The clustering coefficient quantifies how well connected the neighbors of a vertex in a graph are. In
real networks it decreases with the vertex degree, which has been taken as a signature of the network's
hierarchical structure. Here we show that this signature of hierarchical structure is a consequence
of degree-correlation biases in the clustering coefficient definition. We introduce a new definition in
which the degree-correlation biases are filtered out, and provide evidence that in real networks the
clustering coefficient is constant or decays logarithmically with vertex degree.



The increasing availability of network data representing
many real systems has motivated the development
of new statistical measures to characterize large networks
[1, 2, 3, 4, 5]. These measures have revealed that, in
contrast to the classical Erdős-Rényi random graph
model [6], real networks are characterized by a power-law
distribution of vertex degrees [7, 8], a high clustering
coefficient or transitivity [9], and degree correlations
between connected vertices [10, 11, 12]. Yet, it is important
to determine to what extent the new measures
provide new information about the studied networks.
For instance, it has been shown that in some networks
the degree correlations are a consequence of the existence
of large-degree vertices and, therefore, the sequence of
vertex degrees is sufficient to characterize those networks
[13, 14].

In this work we study the influence of degree correlations
on the clustering coefficient. We show that most of
the observed variations of the clustering coefficient with
the vertex degree [15, 16, 17] are determined by the
degree correlations among connected vertices. Based on
this fact, we introduce a new definition of the clustering
coefficient that filters out the effect of degree correlations. The
similarities and differences between the two definitions
are analyzed through the study of different real networks.

Consider undirected simple graphs with vertices $i = 1, \ldots, N$.
Let $k_i$ be the degree of vertex $i$ and $t_i$ the number
of edges among its neighbors. The standard definition of
the local clustering coefficient is

$$c_i = \frac{t_i}{\binom{k_i}{2}}, \tag{1}$$

where $\binom{k_i}{2} = k_i(k_i-1)/2$ is the number of pairs that can be made using
$k_i$ neighbors. Furthermore, two different measures have been
introduced to characterize the global clustering coefficient. The first is just the average of $c_i$ over all vertices
with degree larger than one,

$$\langle c\rangle = \frac{1}{N_2}\sum_{i:\,k_i>1} c_i, \tag{2}$$

where $N_2$ is the number of such vertices. The second is obtained by first computing the averages of $t_i$ and
$\binom{k_i}{2}$ over all vertices and then taking their ratio,

$$C = \frac{\sum_i t_i}{\sum_i \binom{k_i}{2}}. \tag{3}$$




FIG. 1: Double star with two vertices, 1 and 2, connected to
$N-2$ other vertices. The neighbors of vertex 1 (or 2) are
connected as much as their degrees allow. Yet, with the old
definition of clustering coefficient we obtain $c_1 = O(1/N)$,
approaching zero in the limit $N \gg 1$.
As noticed in [18, 19], the two definitions of the global clustering
coefficient may give different values. Consider, for instance,
a double star of $N$ vertices (Fig. 1). In this case
$\langle c\rangle \approx 1$, $C = O(1/N)$, and the two global clustering
coefficients differ dramatically for $N \gg 1$. The limitations of
the clustering coefficient definition are not only related
to the way averages are computed. The local clustering
coefficient of either of the two central vertices of the double
star, vertex 1 for instance, is $c_1 = O(1/N)$, approaching
zero for $N \gg 1$. We cannot, however, increase the
number of connections among the neighbors of vertex 1
without increasing the degrees of its neighbors. In this
sense, the neighbors of vertex 1 are as clustered as they
can be, in contradiction with the small value of $c_1$.
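The double-star numbers can be checked directly. The following Python sketch (standard library only) builds the graph of Fig. 1 as adjacency sets and evaluates $c_1$, $\langle c\rangle$ and $C$ with the old definition. It assumes that the two hubs are also joined to each other, which is what makes $\langle c\rangle \approx 1$; the helper names are hypothetical and not part of any library.

```python
from itertools import combinations
from math import comb


def double_star(n):
    """Hypothetical helper: hubs 0 and 1 joined to each other and to n - 2 leaves."""
    adj = {v: set() for v in range(n)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    link(0, 1)  # assumed hub-hub edge; without it every c_i would be zero
    for leaf in range(2, n):
        link(0, leaf)
        link(1, leaf)
    return adj


def triangles_at(adj, i):
    """t_i: number of edges among the neighbours of vertex i."""
    return sum(1 for a, b in combinations(adj[i], 2) if b in adj[a])


def old_local(adj, i):
    """Standard local clustering c_i = t_i / C(k_i, 2) (requires k_i > 1)."""
    return triangles_at(adj, i) / comb(len(adj[i]), 2)


def old_global(adj):
    """Return (<c>, C): mean of c_i over vertices with k_i > 1, and the ratio of sums."""
    eligible = [i for i in adj if len(adj[i]) > 1]
    mean_c = sum(old_local(adj, i) for i in eligible) / len(eligible)
    ratio_c = sum(triangles_at(adj, i) for i in adj) / sum(
        comb(len(adj[i]), 2) for i in adj
    )
    return mean_c, ratio_c


for n in (10, 100, 1000):
    g = double_star(n)
    mean_c, ratio_c = old_global(g)
    print(f"N={n:5d}  c_1={old_local(g, 0):.4f}  <c>={mean_c:.4f}  C={ratio_c:.4f}")
```

For this construction one can verify by hand that $c_1 = 2/(N-1)$ and $C = 3/N$ exactly, while $\langle c\rangle$ tends to one because the $N-2$ leaves all have $c_i = 1$.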

This example shows that the local clustering coefficient
of a large-degree vertex connected to vertices with much
smaller degrees will always be small, no matter how its
neighbors are interconnected. We would instead like a
measure of the clustering coefficient that allows us to quantify
the connectivity among the neighbors of a vertex,
independently of its degree and the degrees of its neighbors.
The clustering coefficient is a three-vertex correlation
measure and, as is generally the case in statistics, to
define a three-point correlation measure we should filter
out two-point correlations, represented here by the degree
correlations between connected vertices. We tackle
this problem by defining the clustering coefficient relative
to the maximum possible number of edges between the
neighbors of a vertex, given their degree sequence. Let
$\omega_i$ be the maximum number of edges that can be drawn
among the $k_i$ neighbors of a vertex $i$, given the degree
sequence of its neighbors. A neighbor $j$ can have at most
$\min(k_i - 1, k_j - 1)$ edges with the other neighbors;
therefore

$$\omega_i \le \Omega_i = \left\lfloor \frac{1}{2}\sum_{j \in \text{neighbors}(i)} \min(k_i - 1,\, k_j - 1) \right\rfloor. \tag{4}$$

FIG. 2: Algorithm to compute $\omega_i$. (a) A vertex $i$ (open
circle) is connected to five neighbors (filled circles) with degree
sequence $\{8, 7, 2, 2, 2\}$. (b) Since each neighbor can be
connected to at most four other neighbors, we replace the
neighbors' degree sequence (bottom row) by $\{4, 4, 1, 1, 1\}$ (middle
row). It is easy to see that after connecting the first neighbor
to all others, we get 4 triangles and 3 extra edges that
cannot be used anymore (top row). Summarizing, for this example,
$\omega_i = 4$, $\Omega_i = 5$ and $\binom{k_i}{2} = 10$. (c) Subgraph with the maximum
number of edges among the neighbors, with $c_i = 0.4$
and $\tilde c_i = 1$.

TABLE I: Average clustering coefficient as computed with the
old and new definitions. The graphs are listed in increasing
order of their degree of assortativity, quantified by the degree
correlation coefficient r [11], taking values from -1 (fully
disassortative) to 1 (fully assortative).

Graph                  r        ⟨c⟩      C        ⟨c̃⟩      C̃
Internet              -0.19     0.45     0.0090   0.49     ?
protein interaction   -0.13     0.12     ?        0.16     0.19
semantic               0.085    0.75     0.31     0.83     0.59
co-authorship          0.67     0.65     0.56     0.78     0.85
While $\binom{k_i}{2}$ takes into account only the degree of the vertex,
$\Omega_i$ considers that, occasionally, not all the $k_j - 1$
excess edges are available at the neighbors of $i$. $\omega_i$ considers,
in addition, whether the excess edges can
actually form triangles. $\omega_i$ can be computed using the
following algorithm [20]: 1- Starting from the neighbors'
degree sequence $\{k_1, \ldots, k_n\}$ ($n = k_i$), construct the list
$\{\min(k_1, k_i) - 1, \ldots, \min(k_n, k_i) - 1\}$, arranged in decreasing
order. 2- Draw an edge from the first element to
as many other elements in the list as possible, always going
from the largest to the smaller ones. Each time an edge is drawn,
one is subtracted from the remaining degree of the connected
vertices. 3- Remove the first element and any
zeros from the list, and sort the list in decreasing order. 4-
Repeat the process and stop when the list is empty. The
maximum possible number of connections $\omega_i$ is the total
number of edges drawn (see Fig. 2).
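The four steps translate almost line by line into code. Below is a minimal Python sketch of the greedy procedure as described here (our reading of it, not the authors' code), together with the bound $\Omega_i$, taken as the integer part of half the sum of $\min(k_i - 1, k_j - 1)$ over the neighbors, which reproduces the value $\Omega_i = 5$ quoted for Fig. 2. The function names `omega` and `big_omega` are hypothetical.

```python
from math import comb


def omega(k_i, neighbor_degrees):
    """Greedy omega_i: maximum number of edges among the k_i neighbours
    of vertex i, following steps 1-4 of the text."""
    # Step 1: neighbour j can use at most min(k_j, k_i) - 1 excess edges;
    # zero entries can never receive an edge, so they are dropped at once.
    rest = sorted((min(k_j, k_i) - 1 for k_j in neighbor_degrees), reverse=True)
    rest = [r for r in rest if r > 0]
    drawn = 0
    while rest:  # step 4: repeat until the list is empty
        first, others = rest[0], rest[1:]
        # Step 2: link the first element to the largest remaining ones.
        m = min(first, len(others))
        for idx in range(m):
            others[idx] -= 1
        drawn += m
        # Step 3: remove the first element and any zeros, then re-sort.
        rest = sorted((r for r in others if r > 0), reverse=True)
    return drawn


def big_omega(k_i, neighbor_degrees):
    """Upper bound Omega_i: integer part of (1/2) sum_j min(k_i - 1, k_j - 1)."""
    return sum(min(k_i - 1, k_j - 1) for k_j in neighbor_degrees) // 2


# Fig. 2: vertex i has k_i = 5 neighbours with degrees {8, 7, 2, 2, 2}.
degrees = [8, 7, 2, 2, 2]
print(omega(5, degrees), big_omega(5, degrees), comb(5, 2))  # 4 5 10
```

With $t_i = 4$ edges among the neighbors (Fig. 2(c)), these values give $c_i = 4/10 = 0.4$ and $\tilde c_i = 4/4 = 1$. Re-sorting after every round keeps the "largest first" rule of step 2; since the list never holds more than $k_i$ entries, the cost per vertex is at most $O(k_i^2 \log k_i)$.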

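A proper definition of the local clustering coefficient, removing
the effects of degree correlations, is

$$\tilde c_i = \frac{t_i}{\omega_i}, \tag{5}$$

and, in analogy with the two old global measures, $\langle \tilde c\rangle$ is the
average of $\tilde c_i$ over the vertices where it is defined, and

$$\tilde C = \frac{\sum_i t_i}{\sum_i \omega_i}.$$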
Some general properties of the new definition of the clustering
coefficient are the following. (i) If all the neighbors
of a vertex have degree one (star), then its clustering
coefficient is undefined. Indeed, the concept of clustering
is meaningless for the central vertex of a star, as it is
meaningless for degree-one vertices. (ii) $\tilde c_i \ge c_i$, because
Eq. (4) implies $\omega_i \le \Omega_i \le \binom{k_i}{2}$ (each term of the sum is at most
$k_i - 1$). Therefore, when the clustering is one by
the old definition, it is also one by the new definition. Notice
that the opposite is not necessarily true (see Fig. 2(c)).
(iii) When all the $k_i$ neighbors of a vertex $i$ have degrees
larger than or equal to the degree of the vertex itself (a regular
graph, for instance), $\tilde c_i = c_i$.
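Properties (i)-(iii) are easy to check numerically. The sketch below (Python, standard library only) computes $\tilde c_i = t_i/\omega_i$ with the same greedy $\omega_i$ routine as in the algorithm above and returns `None` where a coefficient is undefined. The helper names and test graphs are hypothetical: a star, the double star of Fig. 1 with its two hubs assumed adjacent, the complete graph on five vertices, and a ring of six vertices.

```python
from itertools import combinations
from math import comb


def graph(edges):
    """Adjacency sets from an edge list (hypothetical helper)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def triangles_at(adj, i):
    """t_i: number of edges among the neighbours of i."""
    return sum(1 for a, b in combinations(adj[i], 2) if b in adj[a])


def omega(k_i, neighbor_degrees):
    """Greedy omega_i (steps 1-4 of the text)."""
    rest = sorted((min(k, k_i) - 1 for k in neighbor_degrees), reverse=True)
    rest = [r for r in rest if r > 0]
    drawn = 0
    while rest:
        first, others = rest[0], rest[1:]
        m = min(first, len(others))
        for idx in range(m):
            others[idx] -= 1
        drawn += m
        rest = sorted((r for r in others if r > 0), reverse=True)
    return drawn


def old_local(adj, i):
    """c_i = t_i / C(k_i, 2), or None for k_i < 2."""
    k_i = len(adj[i])
    return triangles_at(adj, i) / comb(k_i, 2) if k_i > 1 else None


def new_local(adj, i):
    """c~_i = t_i / omega_i, or None when omega_i = 0 (undefined)."""
    w = omega(len(adj[i]), [len(adj[j]) for j in adj[i]])
    return triangles_at(adj, i) / w if w > 0 else None


star = graph((0, v) for v in range(1, 6))
double_star = graph([(0, 1)] + [(h, v) for v in range(2, 10) for h in (0, 1)])
k5 = graph(combinations(range(5), 2))
ring = graph((v, (v + 1) % 6) for v in range(6))

print(new_local(star, 0))                                    # None: property (i)
print(old_local(double_star, 0), new_local(double_star, 0))  # c_1 = 2/9, c~_1 = 1.0
print(old_local(k5, 0), new_local(k5, 0))                    # 1.0 1.0: property (iii)
print(old_local(ring, 0), new_local(ring, 0))                # 0.0 0.0: property (iii)
```

The double-star hub illustrates the point of the new definition: its neighbors are as interconnected as their degrees permit, so $\tilde c_1 = 1$ although $c_1 = 2/9$.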

The example in Fig. 2 shows how the old definition
underestimates the clustering around a given vertex $i$.
In this case, the number of edges between neighbors is
as large as it can be given their degree sequence. This
picture is not captured by the clustering coefficient according
to the old definition ($c_i = 0.4$), but it is correctly
quantified using the new definition ($\tilde c_i = 1$). In
the following we compare the old and new clustering
coefficient definitions using the graph representation of
four real systems. The degree correlations present in
these graphs are quantified by the assortativity coefficient
$r$, taking values between -1 (highly disassortative) and
1 (highly assortative). The systems considered are, in increasing
order of assortativity: 1- the autonomous-system
representation of the Internet, as of April 2001 [21]; 2-
the protein-protein interaction network of the yeast
Saccharomyces cerevisiae [22]; 3- the semantic web of English
synonyms; and 4- the co-authorship network of
mathematical publications between 1991 and 1999 [23].
In Table I we show the two global clustering coefficients
as computed with the old and new definitions. For the
two disassortative graphs ($r < 0$), there is an orders-of-magnitude
difference between the global clustering coefficients
$\langle c\rangle$ and $C$ computed with the old definition.
With the new definition, however, both global measures
of the clustering coefficient give values of the same order,
independently of the degree correlations.



Another characteristic feature of the old definition of
the clustering coefficient is that, when the average is restricted
to vertices with the same degree, $\langle c\rangle_k$, it decays
with vertex degree as $\langle c\rangle_k \sim k^{-\alpha}$ [15, 16, 17, 18]. This
decay can be observed in Fig. 3 for the four graphs considered
here, being more pronounced for the two disassortative
graphs in Fig. 3(a) and (b), and almost absent for
the highly assortative co-authorship graph in Fig. 3(d).
In contrast, when computed with the new definition, Eq. (5),
$\langle \tilde c\rangle_k$ does not exhibit a strong variation with increasing
vertex degree (see Fig. 3).
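Degree-resolved curves such as $\langle c\rangle_k$ and $\langle \tilde c\rangle_k$ are obtained by grouping vertices by degree. A minimal Python sketch of this averaging, assuming the local coefficients have already been computed and stored in a dictionary, with `None` marking vertices where a coefficient is undefined; the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def degree_resolved_average(degree, clustering):
    """<c>_k: mean of the local clustering over all vertices of degree k.

    degree     -- dict vertex -> k_i
    clustering -- dict vertex -> c_i (or c~_i); None marks undefined values
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, k in degree.items():
        c = clustering.get(v)
        if c is None:  # e.g. degree-one vertices or star centres
            continue
        sums[k] += c
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}


# Hypothetical toy data: three vertices of degree 2 and one of degree 3.
degree = {"a": 2, "b": 2, "c": 2, "d": 3}
clustering = {"a": 1.0, "b": 0.5, "c": None, "d": 0.25}
print(degree_resolved_average(degree, clustering))  # {2: 0.75, 3: 0.25}
```

Calling the same function once with the $c_i$ and once with the $\tilde c_i$ of a graph yields the two kinds of curves being compared; on a log-log plot a power-law decay $k^{-\alpha}$ of the first appears as a straight line.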

In particular, the decreasing trend is completely absent
for the Internet (Fig. 3(a)), and the variations between
the smallest and largest new clustering coefficients
are no more than a factor of two, indicating that the variations
previously observed with the standard definition [15]
reflect degree correlations. The large variations of
$\langle c\rangle_k$ with the vertex degree $k$ have been interpreted as
evidence of a hierarchical structure, with high-degree
vertices interconnecting highly connected subgraphs
made of smaller-degree vertices, but with no or few connections
among vertices in different subgraphs [15, 16].
The existence of this hierarchical structure, however, was
already predicted from the analysis of the degree correlations
[10, 15]. The present work bridges
these two different approaches to quantifying the hierarchical
structure of the Internet, showing that the variations
in the clustering coefficient with the vertex degree,
as measured with the old definition, just reflect the
existence of degree correlations. These conclusions
also apply to the protein-protein interaction graph,
whose degree of disassortativity is close to that of the Internet
graph.

In the case of the Internet we can also follow changes
in the clustering coefficient as the network evolves, from
around 3000 vertices in 1997 to 10000 vertices in 2001.
$\langle \tilde c\rangle_k$ remains essentially stationary within this period
(data not shown), as does $\langle c\rangle_k$ [15]. In contrast, in random
graphs with fixed degree distribution and degree correlations
the local clustering coefficient approaches zero
with increasing graph size, independently of the vertex
degree [24]. Therefore, the Internet is more clustered
than expected from the degree distribution and degree
correlations alone.

FIG. 3: Average clustering as a function of the vertex degree,
as computed using the old definition (circles), the new definition
approximating $\omega_i$ by $\Omega_i$ (squares), and the new definition
using $\omega_i$ (triangles). The graphs are shown in increasing order
of their assortativity, with the most disassortative graph
at the top and the most assortative graph at the bottom:
(a) Internet, (b) protein interaction, (c) semantic web, (d) co-authorship.

In the case of the semantic web (Fig. 3(c)), although
the clustering coefficient variations are reduced after filtering
out the degree correlations, there is still a logarithmic
decrease with increasing vertex degree (see the inset
of Fig. 3(c)). Using a deterministic growing graph model
introduced in [25], we show that this logarithmic decay
may be the general case for graphs where $\langle c\rangle_k \sim 1/k$. In
the deterministic model, we start with one edge at time
$t = -1$. At each time step we create a new triangle on
each existing edge by connecting its two endpoints to a
new vertex. At time $t = 0$ we get one triangle, and at
time $t = 1$ we have the triangle from the previous
step and three new ones, each using one edge of the
old triangle and two new edges with a new vertex
between them. Since this model is built recursively, we can
find by induction the degree of a vertex, $k_i(\tau) = 2^{\tau+1}$, and
the number of triangles passing through it, $t_i = k_i - 1$,
where $\tau$ is the time elapsed since the introduction of the
vertex, resulting in the clustering coefficient $c_i = t_i/\binom{k_i}{2} = 2/k_i$ [25].
To compute the clustering coefficient according to the
new definition, Eq. (5), we need to determine the scaling of $\omega_i$
with the vertex degree $k_i$. From the $\Omega_i$ definition, Eq. (4), and
the evolution rules of the model we obtain the recursive
relation $\Omega_i(\tau + 1) = 2\Omega_i(\tau) + 2^{\tau+1}$. From this
recursive relation and the initial condition $\Omega_i(0) = 1$ we
obtain by induction $\Omega_i(\tau) = (\tau + 1)2^{\tau}$. We have also
obtained an exact expression for $\omega_i$ [20], which in the $\tau \gg 1$
limit results in $\omega_i \approx \Omega_i$ and

$$\tilde c_i \approx \frac{t_i}{\Omega_i} = \frac{2^{\tau+1} - 1}{(\tau + 1)\,2^{\tau}} \approx \frac{2}{\tau + 1} = \frac{2}{\log_2 k_i}.$$
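The induction results for the deterministic model can be confirmed by brute force on small instances. The Python sketch below grows the graph as described (one edge at $t = -1$, then a new vertex on every existing edge at each step) and checks $k_i(\tau) = 2^{\tau+1}$, $t_i = k_i - 1$ and $\Omega_i(\tau) = (\tau + 1)2^{\tau}$ for every vertex created at $t \ge 0$, with $\Omega_i$ computed as half the sum of $\min(k_i - 1, k_j - 1)$ over the neighbors. The two initial vertices are excluded because they start with degree one rather than two. Function names are hypothetical, and $\omega_i$ itself, whose exact expression is not given here, is not computed.

```python
from itertools import combinations


def deterministic_graph(last_step):
    """Grow the model up to time t = last_step.

    Starts from a single edge at t = -1; at every step t = 0, 1, ... each
    existing edge (u, v) receives a new vertex w linked to both u and v.
    Returns the adjacency sets and the creation time of every vertex.
    """
    adj = {0: {1}, 1: {0}}
    born = {0: -1, 1: -1}
    edges = [(0, 1)]
    for t in range(last_step + 1):
        new_edges = []
        for u, v in edges:  # only edges that existed before step t
            w = len(adj)
            adj[w] = {u, v}
            adj[u].add(w)
            adj[v].add(w)
            born[w] = t
            new_edges += [(u, w), (v, w)]
        edges += new_edges
    return adj, born


def triangles_at(adj, i):
    """t_i: number of edges among the neighbours of i."""
    return sum(1 for a, b in combinations(adj[i], 2) if b in adj[a])


def big_omega(adj, i):
    """Omega_i = (1/2) sum_j min(k_i - 1, k_j - 1) (the sum is even here)."""
    k_i = len(adj[i])
    return sum(min(k_i - 1, len(adj[j]) - 1) for j in adj[i]) // 2


T = 5
adj, born = deterministic_graph(T)
for i, b in born.items():
    if b < 0:
        continue  # the two initial vertices start with degree one
    tau = T - b
    k = len(adj[i])
    assert k == 2 ** (tau + 1)
    assert triangles_at(adj, i) == k - 1
    assert big_omega(adj, i) == (tau + 1) * 2 ** tau

for tau in range(T + 1):
    k = 2 ** (tau + 1)
    ratio = (k - 1) / ((tau + 1) * 2 ** tau)
    print(f"tau={tau} k={k} t/Omega={ratio:.3f} 2/log2(k)={2 / (tau + 1):.3f}")
```

Algebraically $t_i/\Omega_i = (2 - 2^{-\tau})/(\tau + 1)$, so the printed ratio approaches $2/(\tau + 1) = 2/\log_2 k_i$ quickly as $\tau$ grows.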



The analysis of the deterministic model indicates that
in graphs where the old definition of the clustering coefficient
is characterized by an inverse proportionality with
the vertex degree, the new clustering coefficient will exhibit
a logarithmic decrease with increasing vertex
degree. This observation is in agreement with the semantic
web data as well (Fig. 3(c)), where $\langle c\rangle_k \sim 1/k$ and
$\langle \tilde c\rangle_k \sim 1/\log k$.

Finally, for the most assortative graph in Fig. 3(d), we
do not observe a substantial difference between the two
definitions of the clustering coefficient. This observation is
explained by the fact that in a highly assortative graph
the degrees of connected vertices are quite similar, so
$\omega_i \approx \Omega_i \approx \binom{k_i}{2}$ and the two clustering coefficient definitions
give similar results.

We have shown evidence that the standard definition
of the clustering coefficient in Eq. (1) contains some biases
due to the degree correlations between connected vertices.
After removing these biases the local clustering
coefficient does not depend strongly on the vertex degree,
being of the same order for small- and large-degree
vertices. More precisely, we observe two different scenarios:
either the local clustering coefficient is approximately
constant, or it decays logarithmically with increasing
vertex degree. These results will eventually force us to
reevaluate the clustering-based analysis of complex networks,
and other approaches [26, 27, 28] based on this
quantity.

We thank A.-L. Barabasi and A. Vespignani for helpful 
comments and discussion. 



[1] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2001).
[2] S. N. Dorogovtsev and J. F. F. Mendes, Adv. Phys. 51, 1079 (2002).
[3] M. E. J. Newman, SIAM Rev. 45, 167 (2003).
[4] S. Bornholdt and H. G. Schuster, eds., Handbook of Graphs and Networks: From the Genome to the Internet (Wiley-VCH, Weinheim, 2003).
[5] R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Internet: A Statistical Physics Approach (Cambridge University Press, Cambridge, 2004).
[6] P. Erdős and A. Rényi, Publicationes Mathematicae 6, 290 (1959).
[7] M. Faloutsos, P. Faloutsos, and C. Faloutsos, Comput. Commun. Rev. 29, 251 (1999).
[8] A.-L. Barabási, R. Albert, H. Jeong, and G. Bianconi, Science 287, 2115a (2000).
[9] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
[10] R. Pastor-Satorras, A. Vázquez, and A. Vespignani, Phys. Rev. Lett. 87, 258701 (2001).
[11] M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002).
[12] S. Maslov and K. Sneppen, Science 296, 910 (2002).
[13] J. Park and M. E. J. Newman, Phys. Rev. E 68, 026112 (2003).
[14] M. Catanzaro, M. Boguñá, and R. Pastor-Satorras, arXiv:cond-mat/0408110.
[15] A. Vázquez, R. Pastor-Satorras, and A. Vespignani, Phys. Rev. E 65, 066130 (2002).
[16] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabási, Science 297, 1551 (2002).
[17] E. Ravasz and A.-L. Barabási, Phys. Rev. E 67, 026112 (2003).
[18] A. Vázquez, Phys. Rev. E 67, 056104 (2003).
[19] B. Bollobás and O. M. Riordan, in Handbook of Graphs and Networks: From the Genome to the Internet, edited by S. Bornholdt and H. G. Schuster (Wiley-VCH, Weinheim, 2003), pp. 1-34.
[20] S. N. Soffer and A. Vázquez, unpublished.
[21] The National Laboratory for Applied Network Research (NLANR), National Science Foundation, http://moat.nlanr.net/
[22] The Database of Interacting Proteins (DIP), http://dip.doe-mbi.ucla.edu/
[23] A.-L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek, Physica A 311, 590 (2002).
[24] S. N. Dorogovtsev, arXiv:cond-mat/0308444.
[25] S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes, Phys. Rev. E 65, 066122 (2002).
[26] M. E. J. Newman, Phys. Rev. E 68, 026121 (2003).
[27] A. Barrat, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani, Proc. Natl. Acad. Sci. USA 101, 3747 (2004).
[28] A. Vázquez, R. Dobrin, D. Sergi, J.-P. Eckmann, Z. Oltvai, and A.-L. Barabási, arXiv:cond-mat/0408431.



